Inferring circRNA-drug sensitivity associations via dual hierarchical attention networks and multiple kernel fusion

Increasing evidence has shown that the expression of circular RNAs (circRNAs) can affect the drug sensitivity of cells and significantly influence drug efficacy. Research into the relationships between circRNAs and drugs is therefore of great significance for understanding circRNA function, and can contribute to the discovery of new drugs and the repurposing of existing drugs. However, validating the function of circRNAs with traditional medical research methods is time-consuming and costly, so efficient and accurate computational models that can assist in discovering potential interactions between circRNAs and drugs are urgently needed. In this study, we propose a novel method, called DHANMKF, that aims to predict potential circRNA-drug sensitivity interactions for further biomedical screening and validation. Firstly, DHANMKF constructs multimodal networks using multiple sources of information on circRNAs and drugs. Secondly, comprehensive intra-type and inter-type node representations are learned from bi-typed multi-relational heterogeneous graphs by attention-based encoders under a hierarchical process. Thirdly, the multi-kernel fusion method is used to fuse the intra-type and inter-type embeddings. Finally, the Dual Laplacian Regularized Least Squares (DLapRLS) method is used to predict potential circRNA-drug sensitivity associations using the combined kernels in the circRNA and drug spaces. Compared with the other methods, DHANMKF obtained the highest AUC value on two datasets. Code is available at https://github.com/cuntjx/DHANMKF. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-023-09899-w.


Introduction
Circular RNA (circRNA) is a unique type of RNA that differs from other RNAs in that it forms a covalently closed loop and is typically considered non-coding. With the advancement of high-throughput genomics technology, circRNA has become a hot topic in RNA biology research [1]. Since the discovery of the first circRNA in RNA viruses in the 1970s [2], the advancement of biomedical technology has resulted in the discovery of an increasing number of circRNAs. However, research into circRNA function progressed very slowly for several decades, until 2013, when Memczak et al. and Hansen et al. proved that the circular RNA of human cerebellar degeneration-related protein has an important function in neural development [3,4]. This discovery led to a great increase in the study of circRNA function. The most notable function of circRNAs is that they act as miRNA sponges, regulating target gene expression by inhibiting miRNA activity. One circRNA can regulate one or multiple miRNAs through multiple miRNA binding sites in its circular sequence [5]. Previous studies have found that circRNA can regulate alternative splicing or transcription [6,7], as well as parental gene expression [8,9]. The results of these studies have also indicated that circRNA plays an important role in physiological and pathological processes, and that the dysregulation of circRNA is closely related to many human diseases [10]. Over the past two decades, several experiments verifying biological function have shown that circRNA has potential as a new clinical diagnostic marker.
Over the years, an increasing number of studies have demonstrated that circRNA can significantly affect the drug sensitivity of cells. For example, Gao et al. [11] screened 18 circRNAs from 3093 circRNAs and then verified them by real-time quantitative reverse transcription PCR. Finally, hsa_circ_0006528 was found to play an important role in chemotherapy resistance in breast cancer patients. Peng et al. [12] first used next-generation sequencing (NGS) technology to identify the comprehensive circRNA expression profile of multidrug-resistant osteosarcoma (OS) cell lines and found that hsa_circ_0004674 was significantly elevated in OS-resistant cells and tissues, and was associated with poor prognosis. This was then verified by quantitative real-time PCR (qRT-PCR). A study by Wu et al. [13] found that hsa_circ_0001546 is decreased in gastric cancer, which is associated with poor prognosis, and that it inhibits drug resistance via the ATM/Chk2/p53-dependent pathway. Ruan et al. [14] used four identification algorithms to describe the expression profile of circRNA in approximately 1000 human cancer cell lines and observed a strong correlation between circRNA expression and drug response. That study systematically demonstrated the effect of circRNAs on drug sensitivity. However, research into the relationship between circRNA and drug sensitivity is a newly emerging field that has developed rapidly over the past decade, so our understanding of this relationship is still in its early stages.
The process of validating the relationships between circRNA and drug sensitivity using traditional biomedical methods is time-consuming and costly. Therefore, some researchers have developed computational models that can help to reveal the potential relationships between circRNAs and drugs. For example, Deng et al. [15] proposed a computational model called GATECDA for predicting the association between circRNA and drug sensitivity. GATECDA is based on the Graph Attention Auto-encoder (GATE) [16]. First, sequence data for circRNAs, structural data for drugs, and circRNA-drug sensitivity association data were collected. Then the similarities among circRNAs and among drugs were calculated, and these data, together with the circRNA-drug sensitivity association data, were input into the GATE in order to generate low-dimensional vector representations of circRNA and drug nodes. Finally, the low-dimensional vector representations generated by the GATE were input into a fully connected neural network for circRNA-drug sensitivity association prediction. Later, Yang et al. [17] proposed a model called MNGACDA. The model constructs a multimodal network based on multiple information sources on circRNAs and drugs. Then, a node-level attention Graph Auto-Encoder is used to obtain low-dimensional embeddings of circRNAs and drugs from the multimodal network. Finally, the low-dimensional embeddings of circRNAs and drugs are input into an inner product decoder to score the association between circRNAs and drug sensitivity. To our knowledge, these were the first models to apply computational methods to predict the potential association between circRNAs and drug sensitivity. Thus far, no other new models have been applied in this field, and new and improved models are much needed.
Since Multiple Kernel Learning (MKL) [18] was proposed, it has been widely applied to bipartite biological networks to improve model performance. Specifically, MKL uses the information contained in the samples to compute multiple kernel matrices, and then obtains the optimal kernel matrix by fusing them. For example, MKGCN, which is based on MKL and GCN [19], was proposed by Yang et al. to infer novel microbe-drug associations. Yan et al. [20] proposed a computational method called MKLC-BiRW, based on MKL and the bi-random walk algorithm, to predict potential drug-target interactions by integrating diverse drug-related and target-related heterogeneous information.
In this study, we propose a novel method, called DHANMKF, that aims to predict potential circRNA-drug sensitivity associations for further biomedical screening and validation. Firstly, DHANMKF constructs multimodal networks using multiple sources of information on circRNAs and drugs. Secondly, comprehensive intra-type and inter-type node representations are learned from multi-relational heterogeneous graphs by attention-based encoders under a hierarchical process. Thirdly, a multi-kernel fusion method is used to fuse the intra-type and inter-type embeddings. Fourthly, the Dual Laplacian Regularized Least Squares (DLapRLS) method is used to predict potential circRNA-drug sensitivity associations using the combined kernels in the circRNA and drug spaces. In order to evaluate the effectiveness of DHANMKF, it was compared with seven state-of-the-art methods on a benchmark dataset under 5-fold cross-validation (5-CV). Compared with the other methods, DHANMKF obtained the highest AUC. Furthermore, an ablation study was performed to compare the experimental results from different perspectives. Finally, case studies were conducted to demonstrate that the DHANMKF model can be a useful tool for studying circRNA-drug sensitivity associations in real situations. To the best of our knowledge, DHANMKF is the first algorithm to use dual hierarchical attention networks for the prediction of circRNA-drug sensitivity associations. Our main contributions, differing from previous approaches, are summarized as follows: (1) We classify nodes into two types, i.e., head nodes and tail nodes, based on the degree of the nodes, and then define the types of edges based on the associations between different kinds of nodes. (2) Based on the differences in types of edges, we use dual hierarchical attention networks to extract the information on circRNAs and drugs, use the multi-kernel fusion method to fuse this information, and then use the dual graph regularized least squares method to predict potential circRNA-drug associations. (3) We tested DHANMKF on two datasets, and the results show that multi-relational dual hierarchical attention networks perform better than the other methods in predicting potential circRNA-drug associations. These results can provide new insights for further research on circRNA-drug associations.

Datasets
Two datasets, data271 and data251, were used in this study. Data271 is from Deng et al. [15], and data251 is from Deng et al. [15] and Peng et al. [21]. CircRNA-drug sensitivity associations were collected from the circRic database [14] by Deng et al. [15], where drug sensitivity data were obtained from the GDSC database [22]. After Wilcoxon tests with a false discovery rate < 0.05, the significant circRNA-drug sensitivity associations were extracted as the data271 dataset, which contains $N_c = 271$ circRNAs, $N_d = 218$ drugs and 4134 circRNA-drug sensitivity associations. Integrating with the dataset of Peng et al. [21], we removed circRNAs with host-gene interaction scores ≤ 0.5 and nodes with a degree of 0. This resulted in the data251 dataset, containing $N_c = 251$ circRNAs, $N_d = 217$ drugs, and 3635 circRNA-drug sensitivity associations. Additional information on these two datasets can be found in the Supplementary file. In our experiment, circRNAs and drugs were represented as two different types of nodes in the network. The node set of $N_c$ circRNAs was defined as $C = \{c_1, \ldots, c_{N_c}\}$. Similarly, the node set of $N_d$ drugs was defined as $D = \{d_1, \ldots, d_{N_d}\}$. An adjacency matrix $Y \in \mathbb{R}^{N_c \times N_d}$ was created to store the circRNA-drug associations. In this matrix, the $N_c$ rows correspond to circRNAs and the $N_d$ columns correspond to drugs. If circRNA $c_i$ ($1 \le i \le N_c$) is associated with drug $d_j$ ($1 \le j \le N_d$), then $Y_{ij} = 1$; otherwise $Y_{ij} = 0$. During the training phase, all entries with $Y_{ij} = 1$ are treated as positive samples and the others are treated as negative samples. We randomly masked some positive samples in $Y$ to obtain $Y_{train}$. In order to calculate the similarity of circRNAs and drugs, the host gene sequences of circRNAs were downloaded from the National Center for Biotechnology Information (NCBI) gene database [23] and the drug structure data were downloaded from NCBI's PubChem database [24].

Sequence similarity of host genes of circRNAs
Applying methods similar to those of Deng et al. [15] and Yang et al. [17], we treated the sequence similarity between host genes of circRNAs as the similarity between circRNAs. In this way, the similarity calculation between circRNAs became the sequence similarity calculation between their host genes. The sequence similarity between host genes was calculated based on the Levenshtein distance between sequences, obtained using the ratio function of Python's Levenshtein package. A similarity matrix $CSS \in \mathbb{R}^{N_c \times N_c}$ was created to store the circRNA sequence similarity.
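The construction of CSS can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the `ratio` function of the Levenshtein package returns the normalized similarity 2·LCS/(len(a)+len(b)), which is our understanding of that function, and re-implements it in pure Python so that the sketch has no dependencies. The function names and the toy sequences are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity_ratio(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; assumed to mirror Levenshtein.ratio."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return 2.0 * lcs_length(a, b) / total


def sequence_similarity_matrix(sequences):
    """Symmetric matrix of pairwise host-gene sequence similarities (CSS)."""
    n = len(sequences)
    css = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            css[i][j] = css[j][i] = similarity_ratio(sequences[i], sequences[j])
    return css


# Toy host-gene sequences (hypothetical, far shorter than real genes).
css = sequence_similarity_matrix(["ACGT", "AGT", "ACGT"])
```

For real host genes, which are thousands of bases long, the library implementation would be preferable to this quadratic pure-Python loop.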

Structural similarity of drugs
The structure of drugs has a great impact on their function. Therefore, it has become common practice to measure the similarity of drugs based on their structure. As in previous studies [25,26], the RDKit toolkit [27] and the Tanimoto method were used to calculate the structural similarities between drugs. The specific process was as follows: first, the structural data on the drugs were obtained from the PubChem database. Then, RDKit was used to calculate the topological fingerprint of each drug. After that, the structural similarity between drugs was calculated using the Tanimoto method. Finally, the drug structural similarity matrix $DSS \in \mathbb{R}^{N_d \times N_d}$ was derived.
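As an illustration of the last two steps, the sketch below computes the Tanimoto coefficient on fingerprints represented as sets of on-bit indices. It is a dependency-free stand-in under stated assumptions: in practice the on-bits would come from RDKit topological fingerprints, the function names are hypothetical, and returning 0 for two empty fingerprints is a convention chosen here, not one taken from the paper.

```python
def tanimoto(bits_a, bits_b) -> float:
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| of two sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return len(a & b) / union


def structural_similarity_matrix(fingerprints):
    """Pairwise Tanimoto similarities between drug fingerprints (DSS)."""
    n = len(fingerprints)
    return [[tanimoto(fingerprints[i], fingerprints[j]) for j in range(n)]
            for i in range(n)]


# Toy fingerprints given as on-bit indices (hypothetical values).
dss = structural_similarity_matrix([{1, 2, 3}, {2, 3, 4}, {1, 2, 3}])
```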

Gaussian interaction profile kernel similarity for circRNAs and drugs
The Gaussian Interaction Profile (GIP) kernel similarity [28] algorithm is a collaborative filtering algorithm that has been widely used in previous studies for similarity calculation [29,30], and it helps to obtain topological information on circRNAs and drugs in relational graphs. Therefore, we calculated the GIP kernel similarity for circRNAs and drugs using the circRNA-drug association network. Firstly, based on the assumption that similar circRNAs are more likely to be associated with similar drugs, we utilized a binary vector $BI(c_i)$, which is the $i$th row of the $Y_{train}$ matrix and represents the associations between circRNA $c_i$ and all drugs in the training matrix of $Y$. Then, the GIP kernel similarity $CGS(c_i, c_j)$ between circRNAs $c_i$ and $c_j$ was calculated as below:
$$CGS(c_i, c_j) = \exp\left(-\gamma_c \left\| BI(c_i) - BI(c_j) \right\|^2\right) \quad (1)$$
$$\gamma_c = \alpha_c \Big/ \left( \frac{1}{N_c} \sum_{i=1}^{N_c} \left\| BI(c_i) \right\|^2 \right) \quad (2)$$
Here, $\alpha_c$ was set to 1, following [28]. Similarly, we calculated the GIP kernel similarity $DGS(d_i, d_j)$ between drugs $d_i$ and $d_j$ as follows:
$$DGS(d_i, d_j) = \exp\left(-\gamma_d \left\| BI(d_i) - BI(d_j) \right\|^2\right) \quad (3)$$
$$\gamma_d = \alpha_d \Big/ \left( \frac{1}{N_d} \sum_{i=1}^{N_d} \left\| BI(d_i) \right\|^2 \right) \quad (4)$$
Here, the binary vector $BI(d_i)$ is the $i$th column of the $Y_{train}$ matrix and represents the associations between drug $d_i$ and all circRNAs in the training matrix of $Y$. $\alpha_d$ was also set to 1, following [28].
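A minimal sketch of the GIP computation is given below. It assumes the standard form of the kernel from [28], in which the bandwidth is the parameter α divided by the mean squared norm of the interaction profiles; the function name and the toy training matrix are hypothetical.

```python
import math


def gip_kernel(profiles, alpha=1.0):
    """GIP kernel between interaction profiles (rows of `profiles`)."""
    n = len(profiles)
    mean_sq_norm = sum(sum(x * x for x in p) for p in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = alpha / mean_sq_norm  # bandwidth normalized by the mean profile norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((x - y) ** 2 for x, y in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


# Toy training matrix: 3 circRNAs (rows) x 3 drugs (columns).
y_train = [[1, 0, 1],
           [0, 1, 0],
           [1, 0, 1]]
cgs = gip_kernel(y_train)                                 # circRNA profiles are rows
dgs = gip_kernel([list(col) for col in zip(*y_train)])    # drug profiles are columns
```

Because the kernel is computed from the masked training matrix, test associations do not leak into the similarity.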

Integrated similarity for circRNAs and drugs
Inspired by the study of Wang et al. [31], we used a nonlinear fusion method to integrate the circRNA similarities and the drug similarities. Taking circRNA similarity as an example, we first normalized the sequence similarity of the host genes of circRNAs, CSS, to obtain CSS′ (Eq. (5)). Then, the K Nearest Neighbors (KNN) algorithm was used to measure the local affinity of CSS (Eq. (6)), where the neighbor set of $c_i$ is the set of its K nearest neighbors in CSS, including $c_i$ itself. This operation is based on the assumption that the higher the local similarity, the more reliable it is. Therefore, the similarity to near neighbors is kept while the similarity to distant nodes is set to 0. Similarly, we repeated the process for CGS and obtained CGS′ and CKNN2. After that, we iteratively updated the similarity matrix for each kind of data. After each iteration, $CSS'^{(t+1)}$ is normalized by Eq. (5), and $CGS'^{(t+1)}$ is normalized in the same way. The iteration stops when the convergence condition is met, that is, when the relative change in CSS′ is less than $10^{-6}$. Assuming that the process involves $t$ iterations, the overall comprehensive similarity matrix of circRNA is obtained by Eq. (9) when the iteration ends.
Based on these rules, the similarity matrix $S_c$ is an asymmetric matrix. Therefore, we symmetrized it and used the result as the circRNA comprehensive similarity matrix. For drugs, we applied the same rules to DSS and DGS to obtain the comprehensive drug similarity matrix $S_d$.
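The KNN sparsification and the final symmetrization can be sketched as follows. This is an illustration under assumptions, not the paper's exact formulas: the row-normalized local affinity follows the similarity network fusion scheme of Wang et al. [31], and averaging a matrix with its transpose is one common way to symmetrize it. All function names are hypothetical.

```python
def knn_affinity(sim, k):
    """Keep, for each node, only its k nearest neighbors (itself included)
    and normalize the kept similarities so that each row sums to 1."""
    n = len(sim)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: sim[i][j], reverse=True)
        neighbors = [i] + others[:max(k - 1, 0)]
        total = sum(sim[i][j] for j in neighbors)
        for j in neighbors:
            out[i][j] = sim[i][j] / total if total > 0 else 0.0
    return out


def symmetrize(matrix):
    """Average a square matrix with its transpose (assumed symmetrization)."""
    n = len(matrix)
    return [[(matrix[i][j] + matrix[j][i]) / 2 for j in range(n)]
            for i in range(n)]


# Toy similarity matrix (hypothetical values).
sim = [[1.0, 0.8, 0.1],
       [0.8, 1.0, 0.3],
       [0.1, 0.3, 1.0]]
knn = knn_affinity(sim, k=2)
sym = symmetrize(knn)
```

The toy example shows why the fused matrix becomes asymmetric: node 2 keeps node 1 as a neighbor, but node 1 does not keep node 2.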

DHANMKF
Dual Hierarchical Attention Networks (DHAN) were proposed by Zhao et al. [32] in 2022. In DHAN, comprehensive node representations are learned from bi-typed multi-relational heterogeneous graphs by intra-type and inter-type attention-based encoders under a hierarchical process. Specifically, DHAN uses two encoders, one to aggregate information on nodes of the same type and the other to aggregate node representations from neighbors of a different type. The model then captures the complex structure of the bi-typed multi-relational heterogeneous graph through a hierarchical process and a dual-level attention operation. It is worth noting that the circRNA-drug association matrix $Y$ is a bi-typed single-relation heterogeneous graph. Therefore, in order to fully utilize the ability of DHAN to extract node embeddings, it is necessary to classify the relationships between nodes.
It is well known that adjacency matrices describing different objects in the biomedical field are sparse, which means that there are many nodes with small degrees. Histograms of the degree distributions of circRNAs and drugs can be found in the Supplementary file. It can be seen that most of the nodes have small degrees, regardless of whether they are circRNA nodes or drug nodes. Intuitively, the biomedical significance of a drug associated with only a few circRNAs differs from that of a drug associated with many circRNAs. Inspired by Liu et al. [33], we categorized the nodes into head nodes and tail nodes according to their degrees. Let $V$ be the set of nodes in a graph. For every node $v \in V$, $N_v$ denotes the set of neighboring nodes of $v$, and the number of elements in $N_v$ is defined as the degree of $v$. We let $V_h$ and $V_t$ denote the sets of head and tail nodes, respectively. For some threshold $K$, we define tail nodes as nodes with a degree not exceeding $K$, i.e., $V_t = \{v \in V : |N_v| \le K\}$, and head nodes as the remaining nodes, $V_h = V \setminus V_t$. $K$ is treated as a hyperparameter in our study. In this way, the association of circRNAs with drugs in dataset data271 changes from being one type to being the following four types.
1. Association between the head node of circRNA and the head node of the drug.
2. Association between the head node of circRNA and the tail node of the drug.
3. Association between the tail node of circRNA and the head node of the drug.
4. Association between the tail node of circRNA and the tail node of the drug.
Because the node types in the circRNA similarity network and the drug similarity network are the same, that is, they are either all circRNA or all drugs, the following three types of associations will be in these two similarity networks.
1. Associations between head nodes.
2. Associations between a head node and a tail node.
3. Associations between tail nodes.
In summary, we represent intra-type relationships and inter-type relationships as $R_{intra} = \{1, 2, 3\}$ and $R_{inter} = \{1, 2, 3, 4\}$, respectively. In the data251 dataset, by contrast, circRNAs were split into two types depending on whether the host gene of the circRNA was associated with a disease or not, which is analogous to splitting circRNAs into head nodes and tail nodes. The same number of edge types is thus obtained, and this definition is more biologically meaningful.
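The degree-based typing of inter-type edges described above can be sketched as follows. This is an illustration only: the function name is hypothetical, degrees are assumed to be counted on the association matrix, and the relation labels 1 to 4 follow the order of the four association types listed earlier.

```python
def inter_relation_types(y, k_circ, k_drug):
    """Label each known circRNA-drug association with a relation in {1, 2, 3, 4}.

    A node is a tail node if its degree does not exceed its threshold,
    otherwise it is a head node.
    """
    circ_degree = [sum(row) for row in y]
    drug_degree = [sum(col) for col in zip(*y)]
    relation = {}
    for i, row in enumerate(y):
        for j, linked in enumerate(row):
            if linked:
                circ_is_tail = circ_degree[i] <= k_circ
                drug_is_tail = drug_degree[j] <= k_drug
                # 1: head-head, 2: head-tail, 3: tail-head, 4: tail-tail
                relation[(i, j)] = 1 + 2 * int(circ_is_tail) + int(drug_is_tail)
    return relation


# Toy association matrix: 3 circRNAs x 3 drugs, thresholds of 1 for both types.
y = [[1, 1, 0],
     [1, 0, 0],
     [0, 0, 1]]
relation = inter_relation_types(y, k_circ=1, k_drug=1)
```

Here circRNA 0 and drug 0 have degree 2 and are head nodes, while all other nodes are tail nodes, so the four associations receive four different labels.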

Intra-type attention-based encoder
After the computational process above, the association network of circRNAs and drugs becomes a bi-typed multi-relational heterogeneous network. Consider a node pair $(n_i, n_j)$ with $n_i, n_j \in C$ connected via an intra-type relationship $\phi_k \in R^{(c)}_{intra} = \{1, 2, 3\}$. Firstly, we initialized the representation matrix of circRNA, in which each node $n_i$ has a feature vector. Secondly, self-attention was performed on the circRNA nodes to formulate the importance $e^k_{ij}$ of a relation-specific node pair $(n_i, n_j)$, where $\|$ denotes the concatenation operation and $a_k^T \in \mathbb{R}^{2d' \times 1}$ denotes the shared node-level attention weight vector under relation $k$. LeakyReLU is the nonlinear activation function, which is widely used in attention-based neural networks. Thirdly, $e^k_{ij}$ is standardized using Eq. (11) to facilitate the comparison of importance between different nodes.
Here $N^{\phi_k}_{intra}(n_i)$ denotes the relation-specific neighbors of $n_i$. The embedding $h^k_i$ of node $n_i$ under a given relation $k$ is then obtained by aggregating over these neighbors, where $\mathrm{Norm}_k$ denotes the relation-specific layer normalization operation. The embedding $h^k_i$ is semantic-specific. Therefore, Eq. (12) is used to fuse the aggregated information of nodes under different specific relations, so that more comprehensive node embeddings can be obtained. In this fusion, the importance $g^k_i$ of each relation is computed with a trainable parameter $q \in \mathbb{R}^{2d' \times 1}$ and, similar to Eq. (11), standardized with the softmax function. The resulting coefficient $\beta^{\phi_k}_{ij}$ measures the local importance of intra-relation $k$. Finally, the intra-type attention-based representation of circRNA node $n_i$ is obtained by combining the relation-specific embeddings. In this combination, $\beta^{\phi_l}_G$ denotes how important intra-type relation $l$ is for all circRNA nodes and can be regarded as a global importance parameter. The global and local importance of the intra-type relationship $l$ are smoothed by the parameter $t$. Both $\beta^{\phi_l}_G$ and $t$ can be learned from training. The aggregated information for node $n_i$ under intra-type relation $l$ is represented by $h^l_i$. The representation matrix of drug is initialized analogously, with a learnable parameter. Using the same process as above, we obtain the intra-type attention-based representation of each drug node. Stacking these representations gives the first-layer output of the intra-type attention-based encoder, that is, the node embedding matrices of circRNAs and drugs. Assuming that the intra-type attention-based encoder has $t$ layers, the output of each layer is taken as the input of the next layer, and repeating this process yields $t$ node embedding matrices for circRNAs and for drugs.

Inter-type attention-based encoder
The purpose of the intra-type attention-based encoder is to learn node embeddings by aggregating the node information of neighbors of the same type, while the purpose of the inter-type attention-based encoder is to handle interactions between different types of nodes. Let $n_i \in C$ and $n_j \in D$, and let $z^c_i$ and $z^d_j$ be the representations of circRNA node $n_i$ and drug node $n_j$ learned by the intra-type attention networks. The node-level importance $c^m_{ij}$ is calculated by Eq. (16) and normalized by Eq. (17), where $N^m_{inter}(n_i)$ denotes the neighbors of node $n_i$ under the specific inter-relation $m$. $W^c_{inter}$ and $W^d_{inter} \in \mathbb{R}^{d' \times d'}$ are two type-specific matrices that map the features $z^c_i$ and $z^d_j$ into a common space, and $a_m \in \mathbb{R}^{2d'}$ is a learnable weight vector. The relation embedding of circRNA node $n_i$ is aggregated from the embeddings of its neighbors of the other type (that is, drug nodes), weighted by the corresponding coefficients, where $\mathrm{Norm}_m$ denotes the layer normalization operation related to the inter-type relation. Then, the importance of the relation embedding $z^m_i$ related to node $n_i$ is obtained by fusing all relational representations by Eq. (19), and it is normalized by Eq. (20) to make relation importance comparable within inter-type relations.
Finally, the representation $u_i$ of circRNA node $n_i$ is obtained by fusing these relation-specific representations. Similarly, we obtain the inter-type attention-based representation of each drug node $n_j$. Stacking these representations gives the first-layer output of the inter-type attention-based encoder, that is, the node embedding matrices of circRNAs and drugs. Assuming that the inter-type attention-based encoder has $M$ layers, the output of each layer is taken as the input of the next layer, and repeating this process yields $M$ node embedding matrices for circRNAs and for drugs.

Multi-kernel fusion
We can extract multiple embeddings from the intra-type attention-based encoder and the inter-type attention-based encoder that represent the information on circRNA nodes and drug nodes of different types and different relationships. For all the embeddings of circRNAs and drugs, we used the GIP kernel similarity function to calculate the circRNA and drug kernel matrices in each layer. We then integrated all of these kernels by multiple kernel fusion in order to fully utilize the information and improve the performance of predicting circRNA-drug associations. The final kernel matrices of circRNA and drug are weighted combinations of the per-layer kernels, with corresponding weights for the circRNA kernels and the drug kernels, respectively.
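A minimal sketch of this step is shown below. It assumes a Gaussian kernel with bandwidth γ on the node embeddings and a weighted average of the per-layer kernels; the function names are hypothetical, and dividing by the sum of the weights is a choice made for this sketch rather than a detail taken from the paper.

```python
import math


def embedding_kernel(embeddings, gamma):
    """Gaussian kernel exp(-gamma * ||h_i - h_j||^2) between node embeddings."""
    n = len(embeddings)
    return [[math.exp(-gamma * sum((a - b) ** 2
                                   for a, b in zip(embeddings[i], embeddings[j])))
             for j in range(n)]
            for i in range(n)]


def fuse_kernels(kernels, weights):
    """Weighted average of kernel matrices of identical shape."""
    total = sum(weights)
    n = len(kernels[0])
    return [[sum(w * k[i][j] for w, k in zip(weights, kernels)) / total
             for j in range(n)]
            for i in range(n)]


# Toy embeddings of two nodes from two encoder outputs (hypothetical values).
k_intra = embedding_kernel([[0.0, 0.0], [1.0, 0.0]], gamma=0.5)
k_inter = embedding_kernel([[0.0, 0.0], [0.0, 2.0]], gamma=0.5)
fused = fuse_kernels([k_intra, k_inter], weights=[1.0, 1.0])
```

The same two functions would be applied separately to the circRNA embeddings and the drug embeddings to obtain one fused kernel per node type.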

Dual Laplacian regularized least squares model
Inspired by previous studies [34, 35], we adopted the Dual Laplacian Regularized Least Squares (DLapRLS) method to predict circRNA-drug associations. DLapRLS adds graph regularization in order to avoid overfitting. In the loss function, $\|\cdot\|_F$ is the Frobenius norm, $\alpha_c$ and $\alpha_d^T \in \mathbb{R}^{N_c \times N_d}$ are learnable matrices, and $\phi_c$ and $\phi_d$ are regularization parameters. $L_c \in \mathbb{R}^{N_c \times N_c}$ and $L_d \in \mathbb{R}^{N_d \times N_d}$ are normalized Laplacian matrices built from the circRNA and drug kernels and their diagonal degree matrices. Finally, the prediction $F$ for circRNA-drug associations is obtained from the circRNA and drug kernels.
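The normalized Laplacian used for graph regularization can be sketched as follows. The sketch assumes the common symmetric normalization $L = D^{-1/2}(D - K)D^{-1/2}$, where $K$ is a kernel matrix and $D$ is the diagonal matrix of its row sums; the function name is hypothetical and the paper's exact definition may differ.

```python
import math


def normalized_laplacian(kernel):
    """Symmetric normalized Laplacian D^{-1/2} (D - K) D^{-1/2} of a kernel."""
    n = len(kernel)
    degree = [sum(row) for row in kernel]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degree]
    return [[inv_sqrt[i]
             * ((degree[i] if i == j else 0.0) - kernel[i][j])
             * inv_sqrt[j]
             for j in range(n)]
            for i in range(n)]


# Toy 2 x 2 kernel (hypothetical values).
lap = normalized_laplacian([[1.0, 0.5],
                            [0.5, 1.0]])
```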

Training
Except for the parameters $\alpha_c$ and $\alpha_d$, the parameters of our model are updated by Adam [36]. The parameters $\alpha_c$ and $\alpha_d$ are updated by calculating the partial derivatives of the DLapRLS loss. The specific calculation process is as follows. We first assume that $\alpha_d$ is a constant matrix when $\alpha_c$ is optimized, calculate the partial derivative of the loss function Eq. (26) with respect to $\alpha_c$, and let $\partial J / \partial \alpha_c = 0$ to obtain $\alpha_c$ in closed form (Eq. (31)). Similarly, we calculate the partial derivative of the loss function Eq. (26) with respect to $\alpha_d$ and let $\partial J / \partial \alpha_d = 0$ to obtain $\alpha_d$ (Eq. (33)). $\alpha_c$ and $\alpha_d$ were randomly initialized at the beginning of model training and were then calculated directly by Eqs. (31) and (33) in each iteration, while the other parameters were optimized by Adam. The flowchart of our model is shown in Fig. 1. All experimentally verified circRNA-drug associations were treated as positive samples, and the unknown circRNA-drug associations were treated as negative samples, similar to the work of Deng et al. [15] and Yang et al. [17]. Negative samples equal in number to the positive samples were then randomly selected from all the unknown circRNA-drug associations, so that the same number of positive and negative samples was used for training.
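The balanced sampling of training pairs can be sketched as follows. This is an illustrative sketch: the function name and the fixed seed are hypothetical, and it assumes that there are at least as many unknown pairs as known associations, which holds for sparse association matrices.

```python
import random


def sample_balanced_pairs(y, seed=0):
    """Return all positive (i, j) pairs and an equally sized random sample
    of unknown pairs to be used as negatives."""
    positives = [(i, j) for i, row in enumerate(y)
                 for j, v in enumerate(row) if v == 1]
    unknown = [(i, j) for i, row in enumerate(y)
               for j, v in enumerate(row) if v == 0]
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    negatives = rng.sample(unknown, len(positives))
    return positives, negatives


# Toy association matrix with 3 known associations and 6 unknown pairs.
y = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1]]
positives, negatives = sample_balanced_pairs(y)
```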

Implementation details and performance evaluation
The model used in this study was implemented with PyTorch and PyG, and we evaluated its predictive performance using 5-fold cross-validation (5-CV). The number of training epochs was set to 40, the learning rate to 0.05 and the weight decay to 0.01. The number of layers for both the intra-type attention-based encoder and the inter-type attention-based encoder was set to 1, and the output dimensions were both set to 16. The thresholds for distinguishing the head and tail nodes of circRNAs and drugs were set to 27 and 39, respectively. The number of attention heads was set to 5, and the remaining hyperparameters were set as follows: dropout = 0.026, $\gamma = 1/75$ and $\phi_c = \phi_d = 1/120$. During evaluation, we randomly divided all the samples into 5 folds. Four of these folds were used as the training set while the remaining fold was treated as the test set. Seven metrics were used to compare model performance: AUC, AUPR, Accuracy, Precision, Recall, F1-Score, and Specificity. Higher AUC and AUPR values indicate better model performance. F1-Score is the harmonic mean of precision and recall, while specificity measures the ability of the classifier to correctly identify negative cases.
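The threshold-based metrics above can be computed from the confusion-matrix counts as in the following sketch. The function name and the toy counts are hypothetical. AUC and AUPR are omitted because they are computed from ranked prediction scores rather than from a single confusion matrix.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, specificity and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}


# Toy counts for one test fold (hypothetical values).
metrics = classification_metrics(tp=8, fp=2, tn=6, fn=4)
```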

Performance comparison with other methods under 5-CV
Computational methods for predicting circRNA-drug sensitivity associations are currently limited. We found that only GATECDA [15] and MNGACDA [17] are specifically designed for predicting circRNA-drug sensitivity associations. Thus, like Ref. [15] and Ref. [17], we compared our model with seven state-of-the-art models from different domains, namely MNGACDA [17], GATECDA [15], MKGCN [35], MINIMDA [37], LAGCN [38], MMGCN [39], and GANLDA [40]. Brief descriptions of these models are provided below:
• MNGACDA [17]: a computational framework for predicting circRNA-drug sensitivity associations. This model uses multimodal networks to learn the embedded representations of circRNAs and drugs, then captures the internal information between nodes in the networks with a node-level attention Graph Auto-Encoder.
• GATECDA [15]: a computational model based on the Graph Attention Auto-encoder (GATE) for predicting circRNA-drug sensitivity associations.
• MKGCN [35]: a computational model based on GCN and MKL for predicting microbe-drug associations.
• MINIMDA [37]: a method of predicting miRNA-disease associations by constructing integrated similarity networks and using multimodal networks to obtain embedding representations of miRNAs and diseases. These representations are then fed into a multilayer perceptron for prediction.
• LAGCN [38]: LAGCN integrates various associations into a heterogeneous network, learns embeddings of drugs and diseases by graph convolution operations, and then combines multiple layers by using an attention function.
• MMGCN [39]: MMGCN differs from simple multi-source integration in that it uses a GCN encoder to obtain miRNA and disease features in different similarity views and enhances the learned representations for association prediction by using multichannel attention that adaptively learns the importance of different features.
• GANLDA [40]: this method combines heterogeneous data of lncRNA and disease as original features and reduces noise by using Principal Component Analysis (PCA). Then a Graph Attention Network is used to extract information from the features. Finally, a multi-layer perceptron is used to predict lncRNA-disease associations.
Fig. 1 The overview of our proposed method
The prediction performance of each method was evaluated by a 5CV experiment using the same settings and optimal parameters recommended in their respective studies.From Table 1, it can be seen that DHANMKF achieved the highest AUC and AUPR values.This indicates that DHANMKF performed better overall compared to the other models.

Evaluation of parameters
The prediction performance of DHANMKF is affected by various parameter values. The parameters of DHANMKF can be divided into four parts: the parameters in the inter-type attention-based encoder and the intra-type attention-based encoder, the bandwidth parameter $\gamma$ in MKF, the regularization parameters ($\phi_c$ and $\phi_d$) in DLapRLS, and the degree threshold parameters ($K_c$ and $K_d$) for distinguishing circRNA and drug nodes as head and tail nodes.
Here, the process of parameter evaluation is demonstrated using data271 as the baseline dataset.The parameter settings of DHANMKF on the data251 dataset have been put into the Supplementary file.

Optimizable parameters in the intra-type attention-based encoder and the inter-type attention-based encoder
• Learning rate and weight decay. The learning rate and weight decay are the same in the intra-type attention-based encoder and the inter-type attention-based encoder. Following Zhao et al. [32], we set them to 0.05 and 0.01, respectively.
• Dropout and number of training epochs. We searched the dropout rate over {0.02, 0.021, ..., 0.03}. Model performance peaks at a dropout of 0.026 and gradually declines as the dropout increases beyond that value. The loss of DHANMKF started to converge at 40 training epochs, so the number of epochs was set to 40.
• Number of attention heads. To give the model a more powerful representation learning capacity, a multi-head attention mechanism was incorporated. This parameter was tuned using 5CV. As shown in Fig. 2A, model performance is best when the number of attention heads is 5.
• Output dimensions. We analyzed the output dimensions of the intra-type attention-based encoder and the inter-type attention-based encoder, as shown in Fig. 2B. The AUC was highest when the output dimension was 16.
• Number of layers of the intra-type attention-based encoder and the inter-type attention-based encoder. As shown in Fig. 3A, the AUC of DHANMKF reaches its optimal value when both encoders have a single layer.
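The encoder settings above can be collected in one place, together with the dropout search grid. The following is only a minimal sketch: the dictionary keys, `dropout_grid`, and `select_best` are hypothetical names chosen for illustration and are not taken from the DHANMKF code, and `score_fn` stands in for whatever routine trains the model and returns its mean AUC under 5CV.

```python
# Encoder settings reported in the text for the data271 dataset.
# The key names are illustrative assumptions, not identifiers from
# the DHANMKF repository.
ENCODER_CONFIG = {
    "learning_rate": 0.05,
    "weight_decay": 0.01,
    "dropout": 0.026,
    "epochs": 40,
    "attention_heads": 5,
    "output_dim": 16,
    "num_layers": 1,
}


def dropout_grid(start=0.02, stop=0.03, step=0.001):
    """Candidate dropout rates {0.02, 0.021, ..., 0.03}."""
    n = int(round((stop - start) / step)) + 1
    # Round to three decimals to avoid floating-point drift in the grid.
    return [round(start + i * step, 3) for i in range(n)]


def select_best(candidates, score_fn):
    """Return the highest-scoring candidate and the score of every candidate.

    score_fn is a hypothetical callable, for example one that trains the
    model with the given setting and returns its mean AUC over 5CV.
    """
    scores = {c: score_fn(c) for c in candidates}
    best = max(scores, key=scores.get)
    return best, scores
```

With a real `score_fn`, `select_best(dropout_grid(), score_fn)` would reproduce the one-parameter-at-a-time search described in this subsection; the same helper applies to the number of heads, output dimensions, and layers.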

Optimizable parameters in MKF and DLapRLS
• The bandwidth parameter γ in MKF is the coefficient 1/(2σ^2) of the Gaussian kernel function, that is, γ = 1/(2σ^2). The parameter σ determines the smoothness of the Gaussian function: the larger σ is, the smoother it is. Adjusting γ therefore trades off over-smoothing against under-smoothing. As shown in Fig. 3B, the AUC of DHANMKF reaches its optimal value when γ = 1/75.
• The parameters φ_c and φ_d play a regularizing role in DLapRLS, and they can be adjusted to balance underfitting and overfitting. From Fig. 4A, we can see that the AUC of the model reaches its maximum when φ_c and φ_d are both 1/120.
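To make the role of γ concrete, the sketch below computes a Gaussian kernel k(x, y) = exp(-γ ||x - y||^2) over the rows of an embedding matrix, with γ = 1/(2σ^2). The function names are hypothetical. The `average_kernels` helper assumes that the per-embedding kernels are fused by a plain arithmetic mean; this is an illustrative assumption, and the fusion rule used by DHANMKF may differ.

```python
import numpy as np


def gamma_from_sigma(sigma):
    """Bandwidth gamma = 1 / (2 * sigma^2) of the Gaussian kernel."""
    return 1.0 / (2.0 * sigma ** 2)


def gaussian_kernel(H, gamma):
    """Gaussian kernel matrix between the rows of the embedding matrix H."""
    H = np.asarray(H, dtype=float)
    sq_norms = np.sum(H ** 2, axis=1)
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 <x, y>
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (H @ H.T)
    # Clip tiny negative values caused by floating-point round-off.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-gamma * sq_dists)


def average_kernels(kernels):
    """Fuse kernels by their arithmetic mean (an assumption, see lead-in)."""
    return np.mean(np.stack(kernels, axis=0), axis=0)
```

A smaller γ (larger σ) pushes all kernel entries toward 1 and blurs the differences between nodes, while a larger γ makes the kernel close to the identity; the reported optimum γ = 1/75 sits between these extremes.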

Optimization of head and tail node thresholds
The thresholds for head and tail nodes control how many nodes are treated as head nodes or tail nodes and how many association types arise, thus affecting the embeddings of the corresponding nodes. As shown in Fig. 4B, the maximum AUC value of the model is achieved when the thresholds of the circRNA node and drug node are 27 and 39, respectively.
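The effect of the two thresholds can be sketched as follows. This is an illustration under stated assumptions: a node is taken to be a head node when its degree in the circRNA-drug association matrix is at least the threshold (the exact comparison used by DHANMKF is not given in this section), and the function names and edge labels are hypothetical.

```python
import numpy as np


def split_head_tail(A, k_c, k_d):
    """Mark circRNAs (rows) and drugs (columns) of A as head nodes.

    A is a binary circRNA x drug association matrix. A node is treated
    as a head node when its degree is >= the threshold (an assumption).
    """
    A = np.asarray(A)
    circ_degree = A.sum(axis=1)
    drug_degree = A.sum(axis=0)
    return circ_degree >= k_c, drug_degree >= k_d


def label_edges(A, circ_is_head, drug_is_head):
    """Label each known association by the roles of its two endpoints."""
    A = np.asarray(A)
    labels = {}
    for i, j in zip(*np.nonzero(A)):
        circ_role = "head" if circ_is_head[i] else "tail"
        drug_role = "head" if drug_is_head[j] else "tail"
        labels[(int(i), int(j))] = f"{circ_role}-{drug_role}"
    return labels
```

Raising either threshold moves nodes from the head group to the tail group, which changes both the group sizes and the mix of edge labels; this is why the thresholds influence the learned embeddings.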

Ablation tests
Ablation experiments were conducted from two perspectives: (1) analyzing the importance of the intra-type attention-based encoder and the inter-type attention-based encoder; and (2) analyzing the effects of multiple relationships. We therefore constructed three ablation variants. The first one is called DHANMKF-intra, which means that DHANMKF removes the embedding produced by the intra-type attention-based encoder when performing multi-kernel fusion. The second one is called DHANMKF-inter, which means that DHANMKF removes the embedding produced by the inter-type attention-based encoder when performing multi-kernel fusion.

Fig. 2 DHANMKF's attention heads and output dimensions
The third one is called DHANMKF-multi, which means that the model no longer divides the relationships between nodes into multiple categories.
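The ablation variants are compared by AUC under five-fold cross-validation (5CV). The sketch below shows one common way to compute both pieces; it is an assumption-laden illustration, not the authors' evaluation script. `auc_score` uses the pairwise (Mann-Whitney) definition of AUC with ties counted as one half, and `five_fold_indices` simply shuffles sample indices into five disjoint folds.

```python
import numpy as np


def auc_score(labels, scores):
    """AUC as the probability that a positive outscores a negative."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Compare every positive score with every negative score.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle sample indices and split them into disjoint folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_folds)
```

In each round, one fold of known associations would be hidden from training and scored together with the unknown pairs; the AUC values of the five rounds are then averaged.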
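Table 2 shows the comparison results under 5CV. From Table 2, we can see that DHANMKF performs better than all other models. This shows that: (1) Compared with DHANMKF-intra and DHANMKF-inter, DHANMKF performs better, which means that the embeddings produced by the intra-type attention-based encoder and the inter-type attention-based encoder improve the performance of the model. (2) Compared with DHANMKF-multi, DHANMKF can generate node embeddings corresponding to different relationships between nodes. In summary, there are two main reasons why DHANMKF can outperform other models. The first reason is that our model can fully capture the complex structures of the bi-typed multi-relational heterogeneous graphs. The second reason is that biological networks in reality are sparse, so it is reasonable to divide the nodes in biological networks into head nodes and tail nodes for analysis.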

Case studies
To further evaluate the predictive performance of our model, we selected two drugs, PAC-1 and Belinostat, for case studies. Similar to Deng et al. [15] and Yang et al. [17], we used the circRNA-drug associations in the GDSC database as the training set and those in the CTRP database as the testing set. For each drug, we chose the top 20 circRNAs with the highest predicted scores from our model's circRNA-drug association prediction outputs for validation. PAC-1 is the first known small-molecule drug that directly activates procaspase-3 to caspase-3 [41]. It not only enhances procaspase-3 activity but also induces cancer cell apoptosis. In vitro experiments have shown that PAC-1 exhibits cytotoxicity against lymphoma, multiple myeloma, and many other cancer cells [42]. Currently, PAC-1 has been used in clinical trials for the treatment of various tumors, including but not limited to lymphoma, melanoma, solid tumors, breast cancer, and lung cancer [43]. As shown in Table 3, among the top 20 circRNAs predicted by our method to be associated with PAC-1, 16 have been identified in CTRP.
Belinostat is a small-molecule hydroxamate-type inhibitor that can inhibit the activity of class I, II and IV histone deacetylase enzymes. It has been used to treat relapsed or refractory peripheral T-cell lymphoma [44]. Table 4 shows that 17 of the top 20 circRNAs predicted by our method have been confirmed in circRic.
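The validation step used in these case studies (rank circRNAs by predicted score for one drug, keep the top k, and count how many appear in an independent database) can be sketched as follows. The function name and arguments are hypothetical, and `confirmed` stands in for the set of circRNA indices recorded for that drug in the external resource.

```python
import numpy as np


def top_k_hits(scores, drug_idx, k, confirmed):
    """Rank circRNAs for one drug and count confirmed ones in the top k.

    scores    : circRNA x drug matrix of predicted association scores
    drug_idx  : column index of the drug of interest
    confirmed : set of circRNA row indices found in an independent database
    """
    column = np.asarray(scores, dtype=float)[:, drug_idx]
    # Sort in descending order of score; a stable sort keeps ties in row order.
    order = np.argsort(-column, kind="stable")[:k]
    ranked = [int(i) for i in order]
    hits = sum(1 for i in ranked if i in confirmed)
    return ranked, hits
```

For the results reported here, k would be 20 and `hits` would correspond to the 16 of 20 (PAC-1) and 17 of 20 (Belinostat) confirmed circRNAs.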
To demonstrate the performance of DHANMKF in predicting potential associations between new drugs and circRNAs, we chose two drugs for ab initio testing, each of which had only one known circRNA-drug association. During the training phase, we removed the single known association between each of these two drugs and its circRNA. At this point, the two drugs were not associated with any circRNAs and were treated as new drugs during training. These two drugs were Bortezomib and MS-275 (Entinostat). Bortezomib is a novel proteasome inhibitor with potent chemo/radio-sensitizing effects that can overcome the traditional resistance of tumors when used in combination with chemotherapy [45]. In addition, existing clinical applications have shown that Bortezomib can improve clinical outcomes in the treatment of hematologic malignancies [46].
MS-275, also known as Entinostat, is effective in human leukemia cells and lymphoma cells. It can reduce the level of Bcl-XL in cells, induce p21 protein expression, cause cell cycle arrest (G1 phase), and induce cell apoptosis [47]. In addition, when used in combination with other drugs, Entinostat can enhance the activity of some anticancer drugs, including Rituximab, Gemcitabine, Doxorubicin, Sorafenib and Bortezomib. Currently, Entinostat is undergoing phase III clinical trials, and its clinical data show that it has great potential for treating breast cancer [48].
As shown in Table 5, 6 of the top 10 predicted circRNAs associated with Bortezomib have been confirmed in circRic, and 7 of the top 10 circRNAs related to MS-275 have been confirmed in circRic.
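The new-drug setting described above amounts to clearing a drug's column in the association matrix before training. A minimal sketch, with a hypothetical function name and assuming a binary circRNA x drug matrix:

```python
import numpy as np


def hide_drug_associations(A, drug_idx):
    """Return a training copy of A in which one drug has no associations.

    Also returns the row indices of the circRNAs whose associations were
    hidden, so that they can be checked against the predictions later.
    """
    A_train = np.array(A, copy=True)
    held_out = np.flatnonzero(A_train[:, drug_idx])
    A_train[:, drug_idx] = 0
    return A_train, held_out
```

The model would then be trained on `A_train`, and the scores predicted for column `drug_idx` would be ranked and compared with an independent resource, as in Table 5.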

Conclusions
Research over the past twenty years has shown that circRNA plays an important role in drug sensitivity. Therefore, predicting the potential association between circRNA and drug sensitivity can be helpful in drug development and utilization, thus benefiting patients. In this study, we proposed a method based on intra-type attention and inter-type attention, called DHANMKF, for discovering potential circRNA-drug sensitivity associations. To verify the effectiveness of the model, DHANMKF was compared with six state-of-the-art methods based on 5CV on benchmark datasets. The results showed that DHANMKF achieved the best performance. In addition, to further evaluate the ability of the model to handle new drugs, case studies were conducted and the model's prediction results were validated using an independent database. The validation results demonstrate that DHANMKF is an effective tool for predicting new circRNA-drug sensitivity associations.
The results show that our model outperforms the baseline models. We believe the main reasons are the following: (1) We classify the nodes into head and tail nodes, which in turn defines the types of edges connecting these two types of nodes. This allows our model to extract node embeddings from the circRNA-drug heterogeneous graph based on different types of edges. (2) The intra-type attention-based encoder can efficiently aggregate information from nodes of the same type. (3) The inter-type attention-based encoder adequately extracts node representations from different types of nodes. (4) The MKF method fuses the multi-relational heterogeneous graph information captured by the two encoders in order to improve the overall performance of the model. In future studies, we plan to integrate more biomedical data in order to generate more comprehensive circRNA and drug kernels and further improve model performance. Currently, there are few studies that use computational methods to predict potential associations between circRNA and drug sensitivity, so further investigation in this field is merited.

are the l-th elements of the circRNA embedding matrix set H^c = {H^c_0, Z^c_1, ..., Z^c_t, U^c_1, ..., U^c_M} and the drug embedding matrix set H^d = {H^d_0, Z^d_1, ..., Z^d_t, U^d_1, ..., U^d_M}, respectively. γ_l denotes the corresponding bandwidth; we set γ_l = γ for l = 1, ..., K + 1, where γ is a hyperparameter and K + 1 is the number of elements in the circRNA embedding matrix set H^c and in the drug embedding matrix set H^d.
(1) Compared with DHANMKF-intra and DHANMKF-inter, DHANMKF performs better, which means that the embeddings produced by the intra-type attention-based encoder and the inter-type attention-based encoder improve the performance of the model. (2) Compared with DHANMKF-multi, DHANMKF can generate node embeddings corresponding to different relationships between nodes.

Fig. 3 DHANMKF's layers and γ

Fig. 4 φ

(2) The intra-type attention-based encoder can efficiently aggregate information from nodes of the same type. (3) The inter-type attention-based encoder adequately extracts node representations from different types of nodes. (4)

Table 1
Performance comparison based on five-fold cross-validation

Table 2
Ablation experiment

of the bi-typed multi-relational heterogeneous graphs. The second reason is that biological networks in reality are sparse, so it is reasonable to divide the nodes in biological networks into head nodes and tail nodes for analysis.

Table 3
Top 20 circRNAs related to PAC-1 predicted by DHANMKF

Table 4
Top 20 circRNAs related to Belinostat predicted by DHANMKF

Table 5
The Top 10 predicted circRNAs associated with the two new drugs Bortezomib and MS-275